Abstract

In this paper we argue two related positions on the design of
generalized structured peer-to-peer overlays. First, we argue
that for large-scale content-sharing applications, the lookup
and content transport functions need to be treated separately.
Second, we argue that off-the-shelf geographic coordinates of
Internet-connected hosts can be used as the basis of a
location-based routing overlay suitable for content sharing and
other applications. We then outline the design principles and
present a design for the generalized routing overlay based on
adaptive hierarchical partitioning of the geographical space.

1 Introduction 

Peer-to-peer overlay networks are nowadays envisioned 
as a single self-organized and decentralized substrate to be 
used by many different large-scale networked applications. 
Since the inception of the concept, a large number of alternative
designs for peer-to-peer overlays have been proposed to support
different application needs. At some point, a consensus emerged
that there is a need for a single well-defined interface to
encapsulate the generic features provided by these overlays.
Progress has been made in this direction and several
well-understood features have been defined [10, 21].

Because the evolution of peer-to-peer research has historically
depended on content-sharing applications, two fundamental
functional requirements of these applications, search and
transport, have guided the design of general-purpose overlays.
Initially, the problem of search or lookup dominated the research,
while the transport-related problems emerged gradually. Researchers
then attempted to accommodate transport-related functionality in
generalized versions of the overlays that were primarily created
for lookup [7]. Also, randomness was introduced in the overlay
structures for several purposes such as load balancing and source
anonymity [23, 24]. As a result, the conventional wisdom that
routing locality in bulk data transport is necessary for efficient
usage of network resources has been ignored.

With rising complaints from ISPs about peer-to-peer traffic, there
have been several proposals for introducing locality awareness in
the overlay structures [29, 16, 20, 28, 26]. In most cases,
localities are defined based on explicit measurements of some
application-level metric such as latency. This class of overlays,
denoted network-aware overlays, suffers from the large background
overhead of the measurement.

In this paper, we argue that the geographic location of the
end-hosts, at the available granularity of the ISP's points of
presence, can be used as the basis of a locality-based routing
overlay. The argument is founded on two observations. First, the
Internet infrastructure has significant geographic clustering and
hierarchical organization [13, 14]. Second, the majority of the
acquaintances in online social networking communities are dictated
by geographic proximity [17]. Thus, if we expect that social
interaction will dominate the cyber-traffic in the near future,
from sending messages, emails and blogs to sharing videos, photos
and music, geography can be used as the basis for an overlay
routing structure that would provide the desired locality
properties.

The main contributions of this paper are the arguments in favor of
our positions on two related issues: whether the overlay support
for content transport should be treated separately from the content
lookup, and whether off-the-shelf geographic coordinates can be
used for constructing the location-based routing overlay. In line
with these arguments, we outline the design principles for the
transport overlay and present an overlay design based on adaptive
hierarchical partitioning of the geographical space.

In Section 2, the history of peer-to-peer research is analyzed to
demonstrate the need for separating the overlay support for lookup
and transport. Section 3 outlines the design goals of an overlay
structure for large-scale content transport. Section 4 presents the
arguments for using geographic coordinates in a location-based
overlay. Section 5 gives a brief description of the proposed
overlay structure and its routing techniques. Section 6 explains
how the structure would be useful in content-sharing applications.

2 Development of P2P Overlays in Retrospect: 
Separation of Lookup and Transport in 
Content Sharing 

Although many different applications have been cited that could
use peer-to-peer overlays [10], the major driving force behind the
design of almost all the overlays is the single most popular
application, content sharing, which allows a huge number of
Internet-connected end-hosts to participate in sharing large data
contents like software packages, media files or live audio/video
streams.

Two related but subtly distinct necessities of this content-sharing
application influenced the design of the peer-to-peer overlay
networks. The first need was indexing and search: how to quickly
find the physical location of a specific content when a description
or a name is given. The second need was data transport: how the
content can be efficiently transported when a large number of hosts
show interest in the same content, either at the same time or
asynchronously over a prolonged period of time.

Several generic overlays attempted to provide both the lookup and
transport features using the same message routing infrastructure
[21, 7]. Key-based routing has been proposed as the standard
service interface that can be used to derive all the necessary
functions for the content-sharing application [10]. It may be
observed that the initial designs of the overlay structures were
driven by the goal of efficient lookup. The necessity of a
transport structure for concurrent or nearly concurrent
transportation of content to multiple receivers came as a secondary
thought. As such, the designers of the lookup overlays attempted to
reuse the infrastructure they had created for lookup for the
purpose of content routing [7].

However, it was later understood that the optimization objective of
the transport overlay is grossly different from that of the lookup
overlay [8, 18]. Lookup needs fast response, while transportation
needs efficient use of the underlying network links. At some point,
it has also been argued that, given current hardware capacities and
the possible scale of overlay networks in the foreseeable future,
the elaborate designs of multi-hop structured routing overlays for
the lookup service add unnecessary complexity [22]. Although there
have been attempts to propose hybrid infrastructures [8] that cover
optimization objectives for both lookup and transport, we argue
that the two are fundamentally different, and hence it is
beneficial to separate these features at the ground level of the
architecture.

A common design decision in the overlay structures that were
primarily designed for the lookup service is to assign random flat
identifiers to the hosts and to define the overlay structure in
terms of the numeric properties of these identifiers. The arguments
placed in favor of such randomness include load balancing,
placement of replicas at uncorrelated hosts and anonymizing the
source of a content. While accepting that all these features are
necessary for content sharing, researchers have argued that
randomized placement of hosts in the overlay structure is not the
only way, nor the best way, of achieving them [30]. Moreover, our
position is that emphasis on such features should not preclude the
conventional wisdom of routing locality in high-volume content
transportation.

Ever since the emergence of the peer-to-peer content-sharing
application, there have been growing complaints and consequent
policing from the ISPs on the traffic generated by peer-to-peer
applications. Though part of the reason for this overwhelming
traffic is the sheer volume of the contents, we believe that the
deliberately randomized message routing topologies of most
peer-to-peer overlays also share part of the blame.

3 Design Principles for the Transport Overlay 

In this paper we focus on the routing overlay that is primarily
used for transport of bulk data content. The Internet Protocol is
sufficiently optimized for carrying data packets between two
endpoints in the network. However, in the large-scale
content-sharing application that dominates the peer-to-peer world,
the same content is transported to a large number of endpoints,
either at the same time or asynchronously over an extended period
of time. Thus, in either a strong or a loose sense, what these
content-sharing applications need is an overlay that can support
the construction of efficient multicast trees.

For bulk data transport, whether unicast or multicast, the primary
optimization goal in choosing the transport paths is the efficient
use of the network resources. A low-latency transport path is
desirable and may even be necessary in some applications, although
it is secondary to resource efficiency in bulk-transport
applications.

The scale of the systems demands that these objectives be achieved
through decentralized routing decisions. The problem has been
thoroughly studied in the realm of IP networks. It is understood
that if some simple principles are followed in local decisions, the
desired global properties emerge.

One such well-known principle is the principle of locality, which
requires that the transport path between two endpoints in the same
local region should remain within the region [11]. This discourages
traffic from taking arbitrary detours that place an unnecessary
burden on the global network. The same principle also yields
low-latency and high-reliability paths.

Another locality property that results in efficient resource usage
in multicasting is the path-convergence property, which states that
paths from a single source to multiple destinations in one locality
should share a significant portion of the path. The smaller the
area of the locality, the larger the common segment should be.
Intuitively, this can be attained if the localities are
hierarchically divided and the traffic follows a direction towards
the destination, gradually resolving the destination at a deeper
level of the hierarchy. Such directional routing with hierarchical
resolution is explained in further detail in Section 5.

There are other design objectives that are common in all 
peer-to-peer overlays, to account for the sheer scale and the 
dynamics of overlay membership. The overlay structure 
should be adaptive and should easily accommodate growth 
and shrinkage of the membership pool. The overhead for 
managing the structure must be low. 

4 Location Awareness: Can Geography Help?

As we understand from the discussion on the overlay de- 
sign principles in Section 3, the overlay must take into ac- 
count the physical location of the hosts and the network links 
with respect to each other while routing traffic. Indeed, sev- 
eral structured overlays have been designed that base their 
routing decisions on location. They differ in the way they 
represent and utilize the location information. 

Pietzuch et al. [20] classify location-based overlays into two
classes: proactive and reactive. Reactive location-based overlays,
such as Meridian [27], take an explicit measurement of location
immediately before each routing decision. Such measurement provides
fresher and more accurate information, but the overhead is large
when the system is loaded. Proactive location-based overlays use
some background mechanism to measure the relative locations of
nodes and use that information for routing decisions. Here the
overhead depends on the dynamics of overlay membership rather than
on system usage. The location information is represented either
implicitly in the choice of the overlay neighbors [4] or explicitly
by placing nodes in a virtual coordinate space [26].

Positioning the Internet hosts in a virtual coordinate space, 
called Network Coordinates, has been an active research is- 
sue for several years, due to the perceived usefulness of such 
coordinates in solving several distributed system problems 
such as resource discovery, replica placement and efficient 
routing. The usual choice of the coordinate space is a multi- 
dimensional Cartesian space, and research shows that the In- 
ternet hosts can be mapped into a low-dimensional Cartesian 
space with acceptable accuracy [19]. Notable projects that
approach such mappings include GNP [12] and Vivaldi [9].
Nevertheless, high background overhead inhibits the acceptability
of network coordinates for many useful applications.

Here we argue that, for the purpose of overlay routing adhering to
the locality principle, the two-dimensional geographic coordinates
(latitude, longitude) of the hosts would provide sufficient
location information. At present, there are a number of
comprehensive databases [3, 1, 2] that resolve the geographic
coordinates of Internet hosts, usually at the resolution of the
point of presence of the ISP. Since the geographic location of
hosts at this resolution is relatively stable within a session, the
information can be obtained by an off-the-shelf database lookup,
eliminating any measurement overhead.
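
As a rough illustration of such a lookup, the sketch below resolves
a host address to (latitude, longitude) with a longest-prefix match
over a small in-memory table. The table GEO_TABLE and the function
locate are hypothetical stand-ins for one of the databases cited
above; the prefixes are documentation-reserved address ranges and
the coordinates are illustrative values only, not real mappings.

```python
import ipaddress
from typing import Optional, Tuple

# Hypothetical prefix-to-coordinate table standing in for a geolocation
# database; entries are illustrative, not real mappings.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), (52.52, 13.40)),
    (ipaddress.ip_network("198.51.100.0/24"), (40.71, -74.01)),
    (ipaddress.ip_network("198.51.100.128/25"), (42.36, -71.06)),
]


def locate(ip: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for net, coord in GEO_TABLE:
        # Prefer the most specific (longest) prefix that contains the address.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, coord)
    return None if best is None else best[1]
```

A peer would perform such a lookup once when joining, since the
location is assumed to be stable within a session.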

Studies on the spatial characteristics of the Internet
infrastructure show that the Internet, like many other networks,
has strong geographic clustering, following the geographic
distribution of its users [14, 13]. Also, like the hierarchical
division of geographic locations made for political and
administrative purposes, the Internet backbone can be roughly
organized in different tiers that serve interconnection for
locations at different levels such as continent, country, state and
city [25]. It is true that there are many instances of different
ISPs in the same geographic location, and the ISP's preferred route
for geographically local traffic between two ISPs may not exactly
follow the locality principle at fine resolution. Nevertheless,
transporting the traffic between points of the same geographic
location locally is arguably beneficial for the globally optimal
use of Internet resources. The growth of caching proxy networks
like Akamai, which contain local web traffic locally, also supports
the argument.

Indeed, before the advent of the peer-to-peer overlays, there were
attempts to introduce geography-directed routing in the Internet
infrastructure [15]. Although the technique faced deployment
hurdles in the rigid infrastructure, the applications it envisioned
have only become more relevant today. Besides resource-efficient
routing, the applications include location-based information
search, finding the nearest services and broadcasting messages in a
geographic community.

The recent growth of social networking platforms suggests that the
social acquaintance of network users will dictate the direction of
the majority of content transport in the Internet in the near
future. Interestingly, a recent study shows that more than two
thirds of the acquaintances in online social networking communities
are defined by geographic locality [17]. If the general-purpose
routing overlay is to be used for future implementations of these
social applications, the finding supports the argument that the
overlay should be carefully optimized for geographically localized
traffic.

5 Structure of the Overlay Interconnection 

In the previous sections, we argued in favor of a location- 
based routing overlay for peer-to-peer applications and that 
geographic coordinates can serve as the basis of such 
location-based overlays. In this section, we present an over- 
lay interconnection structure based on hierarchical partition- 
ing of the geographic space, where traffic is routed towards 
the geographic location of the destination, successively re- 
solving the destination at a deeper level of the hierarchy. 

5.1 Structure 

The universe (the earth's surface) is hierarchically divided into
zones, sub-zones, sub-sub-zones and so on. A zone is divided into
non-overlapping sub-zones, and the higher-level zone completely
covers the areas of all its sub-zones. The shape of the zones needs
to be amenable to concise memory representation and also to easy
computation of whether a point belongs to a zone or not. A simple
shape such as an axis-parallel rectangle may be used for the zones.
At the leaf level of the hierarchy are the zones that are not
divided any further (denoted leaf zones). Each individual overlay
node or peer belongs to a leaf zone at its deepest level, to
successively larger zones at higher levels, and to the zone
covering the universe at the top level. Figure 1(a) illustrates an
example division of the universe and the corresponding tree
representation is shown in Figure 1(b).
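
A minimal sketch of such a zone tree, assuming axis-parallel
rectangles in (latitude, longitude) space, is given below. The class
name Zone, the method leaf_path and the dotted zone names are
illustrative choices, not part of the design. The sketch uses
half-open intervals so that adjacent sub-zones never both claim a
point, and it ignores rectangles that cross the antimeridian.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Zone:
    """Axis-parallel rectangle in (latitude, longitude) space."""
    name: str
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float
    children: List["Zone"] = field(default_factory=list)

    def contains(self, lat: float, lon: float) -> bool:
        # Half-open test: the upper edges belong to the neighbouring zone,
        # so a point exactly on lat_max or lon_max is not contained.
        return (self.lat_min <= lat < self.lat_max
                and self.lon_min <= lon < self.lon_max)

    def leaf_path(self, lat: float, lon: float) -> List[str]:
        """Names of the zones containing the point, from this zone downwards."""
        if not self.contains(lat, lon):
            return []
        for child in self.children:
            if child.contains(lat, lon):
                return [self.name] + child.leaf_path(lat, lon)
        # No sub-zone matched: this zone is the deepest one resolved here.
        return [self.name]
```

For example, with a universe zone "U" that has a sub-zone "U.1"
which in turn has a sub-zone "U.1.1", a point inside "U.1.1"
resolves to the path U, U.1, U.1.1, which mirrors the successive
resolution of a destination at deeper levels of the hierarchy.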

A routing table in each peer stores the overlay neighbors of the
peer. To be able to route messages towards a destination by
successively resolving the zones at finer grain, a peer needs to
know at least one peer in each of the sibling zones at every level
of the hierarchy. At the deepest level, the peer knows all other
peers within its own leaf zone. The routing
table may be organized in rows, each row storing the point- 
ers to the siblings at a different level. For each pointer, the 
IP address of the target peer and the boundary definition of 
the corresponding sibling zone is stored. Also, the boundary 
definition of the self-zone at every level is stored at the corre- 
sponding row. Additionally, peers and zones can be uniquely 
identified globally, using a hierarchical name that concate- 
nates the identifications of the zones at successive levels of 
the hierarchy (as shown in the figure). Such names (denoted 
as overlay identifier) can be stored in each entry of the rout- 
ing table, besides the zone boundary definition. The overlay 
neighborhood of a peer is illustrated in Figure 1(b). 
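
One possible in-memory layout of this routing table is sketched
below. The names Entry, RoutingTable and divergence_level, the
1-based level numbering and the dot used as separator in overlay
identifiers are all assumptions made for illustration. The helper
shows how hierarchical identifiers relate to rows: the contact for a
foreign zone belongs in the row of the first level at which the two
identifiers differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (lat_min, lat_max, lon_min, lon_max)


@dataclass
class Entry:
    overlay_id: str  # hierarchical name of the sibling zone, e.g. "1.3"
    ip: str          # IP address of the contact peer in that zone
    zone: Rect       # boundary definition of the sibling zone


@dataclass
class RoutingTable:
    self_id: str                     # overlay identifier of the peer's leaf zone
    self_zones: List[Rect]           # boundary of the peer's own zone per level
    sibling_rows: List[List[Entry]]  # sibling_rows[k]: sibling zones at level k + 1
    leaf_peers: List[str] = field(default_factory=list)  # IPs in own leaf zone


def divergence_level(id_a: str, id_b: str) -> int:
    """First (1-based) level at which two zone identifiers differ.

    For identical n-level identifiers the result is n + 1, i.e. both
    identifiers name the same zone at every level.
    """
    common = 0
    for x, y in zip(id_a.split("."), id_b.split(".")):
        if x != y:
            break
        common += 1
    return common + 1
```

Under these assumptions, a peer in zone "1.2.3" would keep its
contact for zone "1.4" in row 2, because the two identifiers agree
at level 1 and differ at level 2.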

The beauty of the overlay structure lies in its flexibility to
grow and retract with the membership dynamics, and its ability to
manage this in a completely decentralized way. When the number of
peers in a leaf zone grows beyond a threshold, new sub-zones are
created by dividing the zone according to geographical clusters of
peers. Note that all peers in a leaf zone are neighbors to each
other in the overlay. So any peer, knowing the coordinates of all
other peers in the same leaf zone, can perform the partitioning and
inform all others of the new boundaries and identifiers. Similarly,
when it is discovered that the number of peers in a leaf zone is
below a threshold, the leaf zone can initiate a merge with one of
its siblings. When new sub-zones are created based on geographic
clustering, an area of the previous leaf zone that does not belong
to any of the clusters is also considered a sub-zone and is denoted
the remainder-leaf-zone. The remainder-leaf-zone always serves as a
suitable merger sibling for the other leaf zones. Details of the
adaptation techniques in response to membership dynamics can be
found in [5].

Algorithm 1 RouteToAllPeers(msg, area, level)
 1: if peer.coordinate falls in area then
 2:     Deliver msg to peer
 3: end if
 4: if level <= deepest level d then
 5:     for each entry e in row d of the routing table do
 6:         if e.coordinate falls in area then
 7:             Send new RouteToAllPeers(msg, area, d + 1) to e.IPAddress
 8:         end if
 9:     end for
10: end if
11: for r = d - 1 down to level do
12:     for each entry e in row r of the routing table, except for the one denoting self zone do
13:         if e.zone_boundary intersects area then
14:             Send new RouteToAllPeers(msg, area, r + 1) to e.IPAddress
15:         end if
16:     end for
17: end for



5.2 Routing 

The overlay is able to route messages towards a geographic
location. The target may be all peers or any peer in a specified
area, or the nearest or a nearby peer to a specified point.
Messages may also be routed to a particular peer specified by its
overlay identifier.

To forward a message targeted to an area, a peer uses the
RouteToAllPeers method defined in Algorithm 1. A peer forwards the
message to its contacts in all sibling zones, at all levels of the
routing table, whose zone area intersects the target area (Lines
11-17), and to all peers within the leaf-level self-zone that fall
in the target area (Lines 5-9). The level parameter records the
levels of the hierarchy already resolved; it is set to 1 at the
peer that initiates the routing.
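
The forwarding rule can be exercised with a small in-memory
simulation, sketched below under several assumptions: message
sending is replaced by a direct recursive call, zones are
axis-parallel rectangles, the sibling rows already exclude the
self-zone, and the names Peer, route_to_all_peers and
example_network are illustrative. The guard on the level parameter
is written as level <= d so that a peer whose leaf zone sits at the
level just resolved still forwards to its leaf-zone neighbors; this
reading of the guard is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (lat_min, lat_max, lon_min, lon_max)
Coord = Tuple[float, float]               # (latitude, longitude)


def in_rect(p: Coord, r: Rect) -> bool:
    return r[0] <= p[0] < r[1] and r[2] <= p[1] < r[3]


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] < b[1] and b[0] < a[1] and a[2] < b[3] and b[2] < a[3]


@dataclass
class Peer:
    coord: Coord
    # sibling_rows[r - 1] is row r: (contact name, sibling zone boundary) pairs.
    sibling_rows: List[List[Tuple[str, Rect]]] = field(default_factory=list)
    # Row d: all other peers in this peer's own leaf zone.
    leaf_peers: List[str] = field(default_factory=list)


def route_to_all_peers(net: Dict[str, Peer], name: str, msg: str,
                       area: Rect, level: int, delivered: List[str]) -> None:
    peer = net[name]
    if in_rect(peer.coord, area):                  # Lines 1-3: deliver locally
        delivered.append(name)
    d = len(peer.sibling_rows) + 1                 # deepest level of this peer
    if level <= d:                                 # Line 4
        for other in peer.leaf_peers:              # Lines 5-9: own leaf zone
            if in_rect(net[other].coord, area):
                route_to_all_peers(net, other, msg, area, d + 1, delivered)
    for r in range(d - 1, level - 1, -1):          # Lines 11-17: sibling zones
        for contact, zone in peer.sibling_rows[r - 1]:
            if intersects(zone, area):
                route_to_all_peers(net, contact, msg, area, r + 1, delivered)


def example_network() -> Dict[str, Peer]:
    # Hypothetical layout: the universe is split into west and east halves,
    # and the east half is split again into north-east and south-east.
    west, east = (-90, 90, -180, 0), (-90, 90, 0, 180)
    ne, se = (0, 90, 0, 180), (-90, 0, 0, 180)
    return {
        "w1": Peer((40, -50), [[("e1", east)]], ["w2"]),
        "w2": Peer((-20, -100), [[("e3", east)]], ["w1"]),
        "e1": Peer((50, 10), [[("w1", west)], [("e3", se)]], ["e2"]),
        "e2": Peer((30, 100), [[("w2", west)], [("e3", se)]], ["e1"]),
        "e3": Peer((-30, 20), [[("w1", west)], [("e1", ne)]], []),
    }
```

Starting at peer e3 with level 1 and a target area spanning
latitudes 20 to 60 and longitudes -60 to 120, the message reaches
w1, e1 and e2 exactly once each: e3 hands it to one contact per
intersecting sibling zone, and each contact resolves the remaining
levels inside its own zone.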

Routing a message to a peer at or near a specified point can be
performed by the same algorithm with minor modifications. The
algorithm takes a point as its target parameter instead of an
area. The condition in Line 13 checks whether the point falls in
the zone, and the loop in Lines 11-17 terminates as soon as a match
is found. The loop in Lines 5-9 forwards the message to the peer in
the self-zone that is closest to the target.

If precisely the peer nearest to a specified point is sought, it
can be found by first reaching the peer that is closest to the
point within the zone that holds the point, and then sending a
query message towards a circular area with the target point at its
center and the current peer on the perimeter, to determine whether
any other peer closer to the target exists.
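
The refinement step can be sketched as follows. The great-circle
(haversine) distance on a spherical Earth is an assumption of this
sketch, since the distance measure is not fixed above, and the
function names are illustrative. Scanning a dictionary of candidate
peers stands in for the area query that the overlay would route to
the circular region.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0     # mean radius; the spherical model is an assumption


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def refine_nearest(target: Coord, current: str, peers: Dict[str, Coord]) -> str:
    """Return the peer nearest to target, starting from the current candidate.

    The query circle is centred on the target and has the current peer on
    its perimeter, so only peers strictly inside it can be closer.
    """
    radius = haversine_km(peers[current], target)
    inside = {n: haversine_km(c, target) for n, c in peers.items()
              if n != current and haversine_km(c, target) < radius}
    return min(inside, key=inside.get) if inside else current
```

If no peer replies from inside the circle, the current peer is
already the nearest one and the search stops.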

By construction, the overlay routing adheres to both the locality
and the path-convergence principles outlined in Section 3. An
illustration of the routing paths in Figure 1(a) demonstrates both
properties.

6 Application of the Routing Overlay 

Returning to the origin of peer-to-peer research, i.e., the
large-scale content-sharing application, we argued in Section 2 in
favor of separating its search and transport services. The
generalized routing overlay is designed mainly with the
requirements of high-volume data transport to a large number of
recipients in mind. Here we explain how the overlay may be used for
multicasting and proactive caching.

A request for the content is sent to the source directly. Because
there is a separate service for lookup, the IP address of the
source of a desired content can be obtained from it. The routing
overlay is used for efficient delivery of the content once the
request is received. Knowing the overlay identifier of the
requester, the source is able to explore the overlay route towards
the requester. Due to its locality and path-convergence properties,
this route can be used to create an efficient delivery tree for
live streaming content when there are many requests for the same
content [6].
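
To make the benefit of path convergence concrete, the sketch below
merges several source-rooted overlay routes into one delivery
structure and counts the overlay links used. It is a generic
illustration and not the tree construction scheme of [6]; the
function names and the hop names in the usage note are hypothetical.
The merged structure is a tree only when routes that reach the same
hop share the same prefix, which is what path convergence provides.

```python
from typing import Dict, List, Set


def build_delivery_tree(routes: List[List[str]]) -> Dict[str, Set[str]]:
    """Merge routes that start at the same source into parent -> children."""
    tree: Dict[str, Set[str]] = {}
    for route in routes:
        # Each consecutive pair of hops is one overlay link; shared links
        # are stored once because the children are kept in a set.
        for parent, child in zip(route, route[1:]):
            tree.setdefault(parent, set()).add(child)
    return tree


def link_count(tree: Dict[str, Set[str]]) -> int:
    """Number of distinct overlay links in the merged structure."""
    return sum(len(children) for children in tree.values())
```

For three requesters reached over the routes src-x-y-r1, src-x-y-r2
and src-x-z-r3, separate unicast delivery crosses nine overlay
links, while the merged tree uses six, because the segments src-x
and x-y are shared.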

For non-live content that is shared among many users over an
extended period of time, replicas of the fragments of the content
can be stored along the overlay route. This helps to serve multiple
requests originating in the same geographic locality from the
nearest replica, without causing an unnecessary traffic burden on
the long-distance links.

Besides multicast transport, the routing overlay can be used for
several geography-related applications, such as finding the nearest
service, looking up services in a geographic range from a location,
or broadcasting a message or command to the hosts in a specific
geographic region [5].

7 Conclusion 

In this paper, we raised the issue of separating transport from
lookup in the design of overlays for peer-to-peer content sharing,
and explained how the geographic coordinates of the hosts can be
used to create a location-based overlay that yields efficient
resource usage in bulk transport. Whether such a generalized
overlay will be useful for other, unforeseen applications can only
be examined in the future.